DOCUMENT RESUME 



ED 066 297 



SB 014 177 









AUTHOR        Egelston, Richard L.; Egelston, Judy C.
TITLE         Self-Evaluation and Performance on Classroom Tests.
PUB DATE      Apr 72
NOTE          14p.; Paper presented at the National Association for Research in Science Teaching meeting, Chicago, Illinois, April 1972
EDRS PRICE    MF-$0.65 HC-$3.29
DESCRIPTORS   *Academic Achievement; *Evaluation; *Grade Prediction; Secondary School Science; *Self-Evaluation; Testing

ABSTRACT
In an investigation of the accuracy of self-evaluation on test performance, 210 junior high school science students were asked to predict their scores before and after taking each unit test. Absolute differences between the two predictions and actual scores were the random variables analyzed. Analysis of variance and Markov chain analyses revealed significant differences by achievement level, practice, and in rate of ... learned and perhaps should be incorporated into the school curriculum. (Author/CP)






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.

SELF-EVALUATION AND PERFORMANCE ON CLASSROOM TESTS

Richard L. Egelston and Judy C. Egelston

State University of New York at Geneseo

When people leave the formal educational setting and enter the worlds of work and leisure, they are required to make many decisions based upon their own abilities and interests. Each of these decisions requires some assessment of the degree of success or enjoyment in the activity in which they are to become engaged. Hopefully, the evaluation of the potential activity will be rational and based upon a thorough knowledge of personal capabilities. However, self-evaluation processes may be difficult to learn and may need to be developed and taught within the school curriculum.

Research on self-evaluation is meager, and that which has been done generally involves simple tasks not at all comparable to the complex activities which individuals undertake in later life. Such studies have been typified by tasks involving the pursuit rotor (Rotter, 1942) and number cancellation tasks (Anderson and Brandt, 1939). While it is possible to construct good experimental controls with these simple tasks, the meaningfulness of the tasks for the subjects is somewhat questionable, and any inferences drawn from these studies about level of aspiration or self-evaluation are highly suspect. One meaningful task in the school setting which is repetitive enough for studying self-evaluation is test taking.

Murstein (1965) found that neither high- nor low-achieving college students changed their predictions of final grades as a result of feedback on mid-semester examinations. This result was not confirmed by Wolfe (in press), who found that college students became more accurate predictors as a result of mid-semester feedback.

In an attempt to determine the influence of sex and achievement on the ability of college students to predict test scores, Sumner and Johnson (1949) found discrepancy scores to be smaller for high-achieving students than for low-achieving students. They also found that females at all quartile levels are more accurate predictors than males at comparable levels.

With secondary school students, Pickup and Anthony (1968) found that females who predicted higher scores than they received tended to reduce subsequent predictions, while males did not. Low achievers were more likely than high achievers to predict higher scores than they received.

Pennington's (1940) experiments on college students indicated that failure resulted in a lower level of aspiration, and success (passing with high grades) resulted in an upward swing in predicted scores on the following examination. With fifth-grade children, Anderson and Brandt (1939) found that poor students set goals consistently above past performance, and good students set goals consistently below past performance.

Utilizing the concepts involved in self-evaluation is a task of the problem-solving order as described by Gagne (1965), and involves a great deal of formal reasoning. Inhelder and Piaget (1958) have found that formal reasoning procedures typically begin at age 11 or 12 and build up to a plateau at about age 14 or 15. Since students of this age are normally found in the junior high school, maturational differences were expected.

Several hypotheses were examined in this study: whether students in some achievement quartiles were able to self-evaluate more accurately than those in others; whether experiencing the task of taking the test made any difference in the ability to self-evaluate; whether students in the differing quartiles would improve more, and at differing rates, with practice; and whether sex made any difference in the ability to self-evaluate.

Accurate self-evaluation of pretest performance required the subject to recognize how much information he understood in comparison with what he thought the teacher expected him to know. Few cues were available except for the style and quantity of class review prior to the test, and the practice of making predictions. Additional cues were available for the posttest predictions, such as the number, difficulty, and style of the items, as well as the practice effects. If subjects attended to the cues, it was expected that their accuracy would increase from pretest to posttest prediction. Also, if the students attended to the cues, a practice effect would probably be demonstrated.

Method 

Two hundred ten students in eight general science classes and one earth science class from a rural Eastern New York secondary school were used as subjects. All students were in grades 7-9. Classes varied in size from sixteen to thirty-two students and were taught by two teachers. Within each grade the top one-fourth of the students were homogeneously grouped for enrichment courses, and the remaining students were divided into two sections of comparable ability.

At the beginning of the school year the teachers explained to the students that on each unit test they would be asked to predict the percentage score they would get on the test, immediately before (pretest prediction) and immediately after (posttest prediction) taking the test. Separate slips of paper were stapled to the test for the pretest guess; when filled out, these were torn off and collected. Space was available on the test booklets for recording the posttest predictions. Students were told to base their predictions upon how well they understood the material and how difficult they thought the test would be (or was). Reminders were frequently given that the predictions would not affect actual grades in any way. Care was taken not to provide feedback on the accuracy of prediction, although test results were returned as soon as possible.

Absolute differences between each predicted score and the actual score for the test (discrepancy scores) were used as the random variables.
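As a minimal sketch, using hypothetical scores rather than data from this study, a student's discrepancy score on each test is the absolute value of predicted minus actual percentage score, computed separately for the pretest and posttest predictions. The function name `discrepancy_scores` is illustrative only.

```python
def discrepancy_scores(predicted, actual):
    """Absolute difference |predicted - actual| for each test, in order."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) for p, a in zip(predicted, actual)]


# Hypothetical percentage scores for one student over four unit tests.
pretest_pred = [80, 75, 85, 70]
posttest_pred = [72, 78, 80, 68]
actual = [68, 80, 79, 71]

pre_d = discrepancy_scores(pretest_pred, actual)
post_d = discrepancy_scores(posttest_pred, actual)
print(pre_d)   # [12, 5, 6, 1]
print(post_d)  # [4, 2, 1, 3]
```

Taking the absolute value means an overestimate and an underestimate of the same size count as equally inaccurate.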

The number of tests given to each class ranged between eight and thirteen. All tests were constructed to be somewhat discriminating in nature, and perfect scores were rarely achieved.

In the few cases where a subject failed to make a prediction, the mean prediction was used; it was derived from all the pretest or posttest predicted scores the subject did make.
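The substitution rule for a missing prediction can be sketched as follows, assuming a student's predictions at one time point (pretest or posttest) are held in a list, with `None` marking an omitted prediction. The function name and the data are hypothetical.

```python
from typing import List, Optional


def impute_missing(predictions: List[Optional[float]]) -> List[float]:
    """Fill each missing (None) prediction with the mean of the
    predictions the same student made at the same time point."""
    made = [p for p in predictions if p is not None]
    if not made:
        raise ValueError("no predictions available to average")
    mean = sum(made) / len(made)
    return [mean if p is None else p for p in predictions]


# Hypothetical pretest predictions for one student; the second was omitted.
print(impute_missing([80, None, 70, 90]))  # [80, 80.0, 70, 90]
```

Pretest and posttest lists would be imputed separately, since the text derives the mean from the pretest or the posttest predictions only.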

Within each section, subjects were ranked from high to low on the final examination. Each section was then divided into four achievement levels called quartiles. Within each section, however, the quartiles were unequal in size because of tied scores and section sizes that were not divisible by four.
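The paper does not state exactly how tied scores were assigned, so the following is only one plausible rule, offered as an assumption: rank students from high to low, and give every member of a tied group the quartile of the first rank position the group occupies. Ties then never straddle a boundary, which is one way quartiles of unequal size arise. The function name and scores are hypothetical.

```python
def assign_quartiles(scores):
    """Quartile label (1 = highest) for each final-examination score.

    Hypothetical tie rule: every member of a tied group gets the quartile
    of the first rank position the group occupies.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    first_position = {}  # score -> first 0-based rank position where it appears
    labels = [0] * n
    for position, i in enumerate(order):
        start = first_position.setdefault(scores[i], position)
        labels[i] = start * 4 // n + 1
    return labels


# Hypothetical section of ten students with one tie at 85.
scores = [95, 90, 85, 85, 80, 75, 70, 60, 55, 50]
print(assign_quartiles(scores))  # [1, 1, 1, 1, 2, 3, 3, 3, 4, 4]
```

Here the tie at 85 pulls a fourth student into the top quartile, giving groups of 4, 1, 3, and 2 rather than the 3, 2, 3, 2 split that ten untied scores would produce.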

For each of the nine sections a three-way nonorthogonal trend analysis of variance was conducted. Factor A was the quartile level of the subjects, factor B was the time of prediction (pretest or posttest), and factor C (the trend factor) was the sequence of tests taken.

Tests of hypotheses were performed in the following order: (a) the A x C linear, quadratic, and cubic trend interactions; (b) the C linear, quadratic, and cubic trends; and (c) A, B, and the contrast between the first and last levels of C. The first hypothesis was tested in all six arrangements, with the other hypotheses placed in a particular order. Whether or not significant interactions were present, tests of the main effects were made in all possible orders. In no case were the residual trend components or the residual trend interaction components tested for significance. The assumption was made that each successive practice trial was equally effective in producing an increment in the ability to self-evaluate, although the time intervals between tests were unequal.
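Because each test was assumed to be equally effective as a practice trial, the tests can be treated as equally spaced levels of factor C. That is what permits the standard tabled orthogonal-polynomial coefficients for the linear, quadratic, and cubic trend components. The sketch below derives those coefficients by exact Gram-Schmidt orthogonalization of 1, x, x^2, x^3, and also builds a first-versus-last contrast. The function names and the example means are hypothetical, not taken from the study.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def _to_integers(vec):
    """Rescale a vector of Fractions to the smallest equivalent integers."""
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in vec), 1)
    ints = [int(f * lcm_den) for f in vec]
    common = reduce(gcd, (abs(v) for v in ints if v != 0))
    return [v // common for v in ints]


def orthogonal_poly_contrasts(k, max_degree=3):
    """Linear, quadratic, cubic (by default) contrast coefficients for k
    equally spaced levels, via exact Gram-Schmidt on 1, x, x^2, ..."""
    if k <= max_degree:
        raise ValueError("need more levels than the highest degree")
    xs = [Fraction(i) for i in range(k)]
    basis = [[Fraction(1)] * k]  # constant term; contrasts must be orthogonal to it
    contrasts = []
    for degree in range(1, max_degree + 1):
        v = [x ** degree for x in xs]
        for b in basis:  # remove components along all lower-degree vectors
            proj = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        basis.append(v)
        contrasts.append(_to_integers(v))
    return contrasts


def first_vs_last(k):
    """Contrast of the last test against the first."""
    return [-1] + [0] * (k - 2) + [1]


# Hypothetical mean discrepancy scores for one section over four tests.
means = [12.0, 10.5, 9.0, 9.5]
linear, quadratic, cubic = orthogonal_poly_contrasts(len(means))
print(linear, quadratic, cubic)  # [-3, -1, 1, 3] [1, -1, -1, 1] [-1, 3, -3, 1]
print(sum(c * m for c, m in zip(linear, means)))  # -9.0
print(sum(c * m for c, m in zip(first_vs_last(len(means)), means)))  # -2.5
```

A negative linear estimate indicates discrepancy scores shrinking over the sequence of tests. With unequal intervals and no equal-effectiveness assumption, the coefficients would instead have to be built from the actual test times.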

All hypotheses were tested at the five percent level of significance. 

According to Rotter (1942) and others, predicted scores are often dependent upon the actual performance on the previous trial. Since achievement scores are somewhat related from test to test, it is not unreasonable that predictions will be related to one another, and that discrepancy scores will be mediated by both achievement and previous predictions. For a second analysis of pretest and posttest predictions, the assumption was made that the discrepancy score for trial t+1 was conditional upon the discrepancy score for trial t.


A vector of discrepancy scores was constructed for each student, and the data were coded as conditional frequencies with a five-point interval. The data for all students in each section were pooled, and conditional probability matrices (transition matrices) were derived. A Markov chain analysis provided limiting vectors of probabilities (tolerance .0005) for each section. (The limiting vector provides an estimate of the proportion of time the group will predict within any category over an infinite number of trials.) The limiting vectors were converted to cumulative probability vectors, and the pretest vector was compared with the posttest vector via a Kolmogorov-Smirnov two-sample test.
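The Markov chain steps can be sketched as below, under several stated assumptions: discrepancies are binned into five-point intervals (0-4, 5-9, ...) with everything above the last bin lumped into the top category; transitions are counted from trial t to trial t+1 and pooled across students; the limiting vector is found by power iteration, stopping when no entry changes by more than .0005 (one reading of the paper's tolerance); and only the Kolmogorov-Smirnov D statistic (the largest gap between cumulative vectors) is computed, since the critical value depends on sample sizes the text does not spell out. All names and data are hypothetical.

```python
from itertools import accumulate

BIN_WIDTH = 5  # five-point intervals, as in the text


def to_category(d, n_categories):
    """Map an absolute discrepancy to an interval index; values beyond the
    last interval go into the top category (an assumption)."""
    return min(int(d) // BIN_WIDTH, n_categories - 1)


def transition_matrix(students, n_categories):
    """Pool trial-t -> trial-(t+1) category transitions over students and
    convert counts to row-conditional probabilities."""
    counts = [[0] * n_categories for _ in range(n_categories)]
    for discrepancies in students:
        cats = [to_category(d, n_categories) for d in discrepancies]
        for a, b in zip(cats, cats[1:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a category never observed as a starting state gets a
        # uniform row so that the matrix stays stochastic.
        matrix.append([c / total for c in row] if total else [1 / n_categories] * n_categories)
    return matrix


def limiting_vector(matrix, tol=0.0005, max_iter=10_000):
    """Power iteration v <- vP until no entry changes by more than tol."""
    n = len(matrix)
    v = [1 / n] * n
    for _ in range(max_iter):
        nxt = [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    raise RuntimeError("power iteration did not converge")


def ks_statistic(p, q):
    """Kolmogorov-Smirnov D: largest gap between the cumulative vectors."""
    return max(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical discrepancy sequences for three students in one section.
pre_students = [[12, 8, 6, 4], [3, 7, 2, 1], [15, 11, 9, 6]]
post_students = [[4, 2, 3, 1], [2, 1, 6, 0], [9, 5, 4, 2]]
pre_limit = limiting_vector(transition_matrix(pre_students, 4))
post_limit = limiting_vector(transition_matrix(post_students, 4))
print(pre_limit)  # [0.5, 0.5, 0.0, 0.0]
print(ks_statistic(pre_limit, post_limit))  # roughly 0.3 for these made-up data
```

Power iteration is used here only because it mirrors a convergence tolerance; the same limiting vector could be obtained by solving for the left eigenvector of the transition matrix with eigenvalue 1.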



Results

Significant differences were found among the quartiles (A) within seven of the nine sections, and between the two times of prediction (B) for three sections. No significant A x B interactions were found (see Table 1). Apparently students of differing achievement levels within the same section are not equal in the ability to self-evaluate. Generally the higher achieving students were more accurate than the lower achieving students. For many students, taking the test did not allow a more accurate self-appraisal (before feedback) than was possible before taking the test. Furthermore, the improvement from pretest to posttest prediction remains relatively constant for students at all ability levels. Table 2 summarizes the trend analyses for the nine sections.

Insert Table 1 about here

Insert Table 2 about here



The differences in trend components are relatively unimportant and may be explained by two factors: differences in the students' self-assurance about their understanding of the various units, and the differing difficulty of the tests.

Four of the nine sections displayed significant quartile by test interactions, indicating differing rates of improvement following practice. Each of the four sections was composed of heterogeneously grouped students and contained a larger range of ability than the homogeneously sectioned students. If the sections had been chosen without regard to ability, it is likely that more sections would have produced significant interactions. It might well be that differences in ability need to be quite large before differences in the rate of improvement will be demonstrated within a classroom.

Within the same trend analyses, contrasts of the last predictions with the first predictions were conducted; in seven of the nine sections, predictions were more accurate at the end of the year. Thus, practice tends to improve the accuracy of self-evaluation.

Two-way analyses of variance (sex by time of prediction) were performed after pooling data across tests and quartiles within each section. In no instance was a significant difference found between males and females.

For the additional pretest-posttest analysis the cumulative proportion vectors derived from the Markov chain analysis are illustrated